Abstract. Structuring theories is one of the main approaches to reduce
the combinatorial explosion associated with reasoning about and exploring
large theories. In the past we developed the notion of development
graphs as a means to represent and maintain structured theories. In
this paper we present a methodology and a resulting implementation to
reveal the hidden structure of flat theories by transforming them into
detailed development graphs. We evaluate our approach on plain TSTP
representations of MIZAR articles, obtaining more structured and also
more concise theories.


1 Introduction 

It has long been recognized that the modularity of specifications is an
indispensable prerequisite for efficient reasoning in complex domains. Algebraic
specification techniques provide appropriate frameworks for structuring complex
specifications, and the authors introduced the notion of a development graph
as a technical means to work with and reason about such structured
specifications. While its use presupposes that theories are developed with the
intended structure already in mind, there are various applications of Formal
Methods in which theories are automatically generated in an entirely
unstructured representation. Thus, there is a need for computer-aided structure
formation for large theories, which enables efficient reasoning in such theories.

In this paper we present an initial approach to support structure formation
in large unstructured specifications. The idea is to provide a calculus and a
corresponding methodology to crystallize intrinsic structures hidden in a
specification and to represent them explicitly in terms of development graphs.
Step by step, the specification is split into different nodes, resulting in
increasingly richer development graphs. Conversely, common concepts that are
scattered over different specifications are identified and unified in a common theory.

We start with a discussion of syntactical properties to measure the
appropriateness of a structuring and specify invariants underlying a structure
formation process. Based on this general framework we present a calculus (and
heuristics to guide this calculus) to transform development graphs in order to
enrich the explicitly given structure. We evaluate our framework with the help of
the Mizar Mathematical Library (http://www.mizar.org/), which provides hundreds
of articles that are subject to our structure formation process.

* The final publication is available at http://link.springer.com as part of the
proceedings of the Conference on Intelligent Computer Mathematics 2015.
2 Development Graphs for Structure Formation

We base our framework on the notion of development graphs (and thus on the
notion of institutions) to specify and reason about structured specifications.
Development graphs $\mathcal{D}$ are acyclic, directed graphs $(\mathcal{N}, \mathcal{L})$: the nodes $\mathcal{N}$ denote
individual theories and the links $\mathcal{L}$ indicate theory inclusions with respect to
signature morphisms attached to the links. Each node $N \in \mathcal{N}$ of the graph is
a tuple $(sig^N, ax^N, lem^N)$ such that $sig^N$ is called the local signature of $N$, $ax^N$
a set of local axioms of $N$, and $lem^N$ a set of local lemmas of $N$. $\mathcal{L}$ is a set of
global definition links $M \xrightarrow{\sigma} N$. Each link imports the mapped theory of $M$
(by the signature morphism $\sigma$) as part of the theory of $N$. A node $N$ is globally
reachable from a node $M$ via a signature morphism $\sigma$, $\mathcal{D} \vdash M \xrightarrow{\sigma} N$ for short,
iff either 1. $M = N$ and $\sigma = id$, or 2. $M \xrightarrow{\sigma'} K \in \mathcal{L}$ and $\mathcal{D} \vdash K \xrightarrow{\sigma''} N$,
with $\sigma = \sigma'' \circ \sigma'$. The global signature (global axioms and global lemmata,
respectively) of a node $N \in \mathcal{N}$ is the union of its local signature (local axioms
and local lemmata) and the mapped global signatures of all nodes from which
$N$ is globally reachable. A node is valid if all signature symbols occurring in its
global axioms and lemmata are declared in its global signature. A development
graph is well-defined if all its nodes are valid.

The maximal nodes (root nodes) $\lceil\mathcal{D}\rceil$ of a graph $\mathcal{D}$ are all nodes without
outgoing links. $Dom_{\mathcal{D}}(N) := Sig_{\mathcal{D}}(N) \cup Ax_{\mathcal{D}}(N) \cup Lem_{\mathcal{D}}(N)$ is the set of all
signature symbols, axioms and lemmata visible in a node $N$. The local domain
of $N$, $dom^N := sig^N \cup ax^N \cup lem^N$, is the set of all local signature symbols,
axioms and lemmata of $N$. The imported domain $Imports_{\mathcal{D}}(N)$ of $N$ in $\mathcal{D}$ is
the set of all signature symbols, axioms and lemmata imported via incoming
definition links. $Dom_{\mathcal{D}} := \bigcup_{N \in \mathcal{N}} Dom_{\mathcal{D}}(N)$ is the set of all signature symbols,
axioms and lemmata occurring in $\mathcal{D}$. Analogously we define $Sig_{\mathcal{D}}$, $Ax_{\mathcal{D}}$, and
$Lem_{\mathcal{D}}$. $Dom_{\lceil\mathcal{D}\rceil} := \bigcup_{N \in \lceil\mathcal{D}\rceil} Dom_{\mathcal{D}}(N)$ is the set of all signature symbols, axioms
and lemmata occurring in the maximal nodes of $\mathcal{D}$.
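The definitions of global signatures, validity and maximal nodes can be illustrated by a small Python sketch. It rests on simplifying assumptions that are ours, not the paper's: a signature is a set of symbol names, an axiom or lemma is represented only by the set of symbols it mentions, and each link carries its signature morphism as a renaming dictionary in which unmapped symbols map to themselves. The names `Node`, `DevGraph` and the example theories are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    sig: set                                   # local signature (symbol names)
    ax: dict = field(default_factory=dict)     # axiom name -> frozenset of symbols used
    lem: dict = field(default_factory=dict)    # lemma name -> frozenset of symbols used


@dataclass
class DevGraph:
    nodes: dict                                # node name -> Node
    links: list                                # (source, target, renaming dict)

    def _incoming(self, n):
        return [(src, mor) for src, tgt, mor in self.links if tgt == n]

    def global_sig(self, n):
        """Local signature plus the mapped global signatures of all link sources.

        The recursion terminates because development graphs are acyclic."""
        result = set(self.nodes[n].sig)
        for src, mor in self._incoming(n):
            result |= {mor.get(s, s) for s in self.global_sig(src)}
        return result

    def global_formulas(self, n):
        """Symbol sets of the global axioms and lemmata of n."""
        node = self.nodes[n]
        result = set(node.ax.values()) | set(node.lem.values())
        for src, mor in self._incoming(n):
            result |= {frozenset(mor.get(s, s) for s in f)
                       for f in self.global_formulas(src)}
        return result

    def is_valid(self, n):
        """Every symbol of a global axiom/lemma is declared in the global signature."""
        sig = self.global_sig(n)
        return all(f <= sig for f in self.global_formulas(n))

    def is_well_defined(self):
        return all(self.is_valid(n) for n in self.nodes)

    def maximal_nodes(self):
        """Root nodes: nodes without outgoing links."""
        sources = {src for src, _, _ in self.links}
        return {n for n in self.nodes if n not in sources}


# Hypothetical example: a generic monoid imported into Nat via op -> plus, e -> zero.
g = DevGraph(
    nodes={
        "Monoid": Node(sig={"op", "e"}, ax={"unit": frozenset({"op", "e"})}),
        "Nat": Node(sig={"succ"}),
    },
    links=[("Monoid", "Nat", {"op": "plus", "e": "zero"})],
)
print(sorted(g.global_sig("Nat")))   # ['plus', 'succ', 'zero']
print(g.is_well_defined(), g.maximal_nodes())
```

Representing formulas only by their symbol sets suffices for the validity check, but of course it discards their logical content.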

Given a node $N \in \mathcal{N}$, its associated class $Mod_{\mathcal{D}}(N)$ of models (or $N$-models
for short) consists of those $Sig_{\mathcal{D}}(N)$-models $n$ for which (i) $n$ satisfies the local
axioms $ax^N$, and (ii) for each $K \xrightarrow{\sigma} N \in \mathcal{L}$, $n|_{\sigma}$ is a $K$-model. In the following
we denote the class of $\Sigma$-models that fulfill the $\Sigma$-sentences $\Psi$ by $Mod_{\Sigma}(\Psi)$.

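Given a signature $\Sigma$ and $Ax, Lem \subseteq Sen(\Sigma)$, a support mapping $Supp$ for
$Ax$ and $Lem$ assigns to each lemma $\varphi \in Lem$ a subset $H \subseteq Ax \cup Lem$ such that
(i) $Mod_{\langle S\rangle_\Sigma}(H) \models \varphi$, where $S$ is the set of symbols occurring in $H \cup \{\varphi\}$,¹ and (ii) the relation
$\sqsubset \subseteq (Ax \cup Lem) \times Lem$ with
$\phi \sqsubset \varphi \Leftrightarrow (\phi \in Supp(\varphi) \lor \exists \psi.\, \phi \in Supp(\psi) \land \psi \sqsubset \varphi)$ is a well-founded strict
partial order. If $\mathcal{D}$ is a development graph, then a support mapping $Supp$ is a
support mapping for $\mathcal{D}$ iff for all $N \in \mathcal{N}$, $Supp$ is a support mapping for $Ax_{\mathcal{D}}(N)$
and $Lem_{\mathcal{D}}(N)$.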
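Condition (i) is semantic and needs a prover, but condition (ii) is a purely graph-theoretic check. Since $\sqsubset$ is the transitive closure of the relation "$\phi \in Supp(\varphi)$", on finite sets it is a well-founded strict partial order exactly when that relation has no cycle. The following Python sketch checks this with a depth-first search; the function name and the data layout (formula names as strings, `supp` as a dictionary) are our own assumptions.

```python
def is_support_mapping(ax, lem, supp):
    """Check the syntactic part of a support mapping.

    ax, lem: sets of formula names; supp: dict lemma -> set of supporting names.
    Condition (i) (semantic entailment) is NOT checked. We check that every lemma
    gets a support set drawn from ax | lem and that the "is supported by" relation
    is acyclic, i.e. its transitive closure is a strict partial order (well-founded
    because the sets are finite).
    """
    universe = ax | lem
    if set(supp) != lem or any(not s <= universe for s in supp.values()):
        return False
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on the DFS stack / finished
    colour = {f: WHITE for f in universe}

    def acyclic_from(f):
        colour[f] = GREY
        for g in supp.get(f, ()):          # axioms have no supporters
            if colour[g] == GREY:
                return False               # back edge: a cycle exists
            if colour[g] == WHITE and not acyclic_from(g):
                return False
        colour[f] = BLACK
        return True

    return all(colour[f] != WHITE or acyclic_from(f) for f in universe)


ax = {"a1", "a2"}
lem = {"l1", "l2"}
print(is_support_mapping(ax, lem, {"l1": {"a1"}, "l2": {"l1", "a2"}}))   # True
print(is_support_mapping(ax, lem, {"l1": {"l2"}, "l2": {"l1"}}))         # False
```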

We now state the requirements on development graphs that reflect our intuition
of an appropriate structuring of formal specifications as the following
principles.


¹ where $\langle S\rangle_\Sigma$ denotes the smallest valid sub-signature of $\Sigma$ containing $S$.
The first principle is semantic appropriateness: the structure of the
development graph should be a syntactical reflection of the relations between
the various concepts in our specification. This means that different basic
specifications are located in different nodes of the graph and that the links of
the graph reflect the logical relations between these specifications. The second
principle is closure: for instance, deduced knowledge should be located close to
the axioms guaranteeing its proofs. Also, the specification defined by the theory
of an individual node of a development graph should have a meaning of its own
and provide some source of deduced knowledge. The third principle is
minimality: each concept (or part of it) is represented only once in the graph.
When splitting a monolithic theory into different theories, common foundations
of these theories should be (syntactically) shared between them by being
located at a unique node of the graph.

We now translate these principles into syntactical criteria on development
graphs and into procedures for transforming, or refactoring, development graphs.
In a first step we formalize technical requirements to enforce the minimality
principle in terms of development graphs. Technically, we demand that each
signature symbol, each axiom and each lemma has a unique location in the
development graph. When we enrich a development graph with more structure,
we forbid multiple copies of the same definition in different nodes. We
therefore require that, for a given signature entry, axiom or lemma, we can
identify a minimal theory in the development graph and that this minimal theory is unique.
We define:

Definition 1 (Providing Nodes). Let $(\mathcal{N}, \mathcal{L})$ be a development graph. An
entity $e$ is provided in $N \in \mathcal{N}$ iff $e \in Dom_{(\mathcal{N},\mathcal{L})}(N)$ and $\forall M \xrightarrow{\sigma} N.\; e \notin Dom_{(\mathcal{N},\mathcal{L})}(M)$. Furthermore,

1. $e$ is locally provided in $N$ iff additionally $e \in dom^N$ holds.

2. $e$ is provided by a link $l : M \xrightarrow{\sigma} N$ iff $e$ is not locally provided in $N$ and
$\exists e' \in Dom_{(\mathcal{N},\mathcal{L})}(M).\; \sigma(e') = e$ holds. In this case we say that $l$ provides $e$
from $e'$. $e$ is exclusively provided by $l$ iff $e$ is not provided by any other link
$l' \in \mathcal{L}$.

The closure principle demands that there are no spurious nodes in the graph,
i.e., nodes that do not contribute anything new. We combine these requirements
into the notion of location mappings:

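Definition 2 (Location Mappings). Let $\mathcal{D} = (\mathcal{N}, \mathcal{L})$ be a development graph.
A mapping $loc_{\mathcal{D}} : Dom_{\mathcal{D}} \to \mathcal{N}$ is a location mapping for $\mathcal{D}$ iff

1. $loc_{\mathcal{D}}$ is surjective (closure),

2. $\forall N \in \mathcal{N}.\; \forall e \in dom^N.\; loc_{\mathcal{D}}(e) = N$,

3. $\forall e \in Dom_{\mathcal{D}}.\; loc_{\mathcal{D}}(e)$ is the only node providing $e$ (minimality).

For a given $loc_{\mathcal{D}}$ we define $loc^{-1}_{\mathcal{D}} : \mathcal{N} \to 2^{Dom_{\mathcal{D}}}$ by

$loc^{-1}_{\mathcal{D}}(N) := \{e \in Dom_{\mathcal{D}} \mid loc_{\mathcal{D}}(e) = N\}$.

We write $loc$ and $loc^{-1}$ instead of $loc_{\mathcal{D}}$ and $loc^{-1}_{\mathcal{D}}$ if $\mathcal{D}$ is clear from the context.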
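The following Python sketch checks Definitions 1 and 2 on a small graph. To keep it short it assumes, unlike the general definitions, that every link carries the identity morphism, so an entity keeps its name along links; nodes are given as sets of local entities and links as (source, target) pairs. All names are hypothetical.

```python
def visible(nodes, links, n):
    """Dom(n): local entities plus everything visible in nodes linking into n (acyclic)."""
    result = set(nodes[n])
    for src, tgt in links:
        if tgt == n:
            result |= visible(nodes, links, src)
    return result


def providers(nodes, links, e):
    """Nodes in which e is provided (Definition 1, identity morphisms only)."""
    return {n for n in nodes
            if e in visible(nodes, links, n)
            and all(e not in visible(nodes, links, src)
                    for src, tgt in links if tgt == n)}


def is_location_mapping(nodes, links, loc):
    dom = set().union(*(visible(nodes, links, n) for n in nodes))
    if set(loc) != dom:                                   # loc must be total on Dom_D
        return False
    surjective = set(loc.values()) == set(nodes)           # closure
    local = all(loc[e] == n for n, ents in nodes.items() for e in ents)
    minimal = all(providers(nodes, links, e) == {loc[e]} for e in dom)
    return surjective and local and minimal


nodes = {"Base": {"plus", "zero", "ax_unit"}, "Ext": {"times", "ax_distr"}}
links = [("Base", "Ext")]
loc = {e: n for n, ents in nodes.items() for e in ents}
print(is_location_mapping(nodes, links, loc))   # True
```

Declaring `plus` locally in both nodes would violate condition 2, since `loc` cannot map `plus` to both of them.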

Based on the notion of location mappings we formalize our intuition of a
structuring. The idea is that being a structuring constitutes the invariant of
the structure formation process and guarantees both the requirements imposed
by the minimality principle and basic conditions for a development graph to
reflect a given formal specification.

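Definition 3 (Structuring). Let $\mathcal{D} = (\mathcal{N}, \mathcal{L})$ be a valid development graph,
$loc : Dom_{\mathcal{D}} \to \mathcal{N}$, $\Sigma \in |Sign|$, $Ax, Lem \subseteq Sen(\Sigma)$, and $Supp$ be a support
mapping for $\mathcal{D}$. Then $(\mathcal{D}, loc, Supp)$ is a structuring of $(\Sigma, Ax, Lem)$ iff

1. $loc$ is a location mapping for $\mathcal{D}$;

2. letting $Dom_{\lceil\mathcal{D}\rceil} = \Sigma' \uplus Ax' \uplus Lem'$, we have $\Sigma = \Sigma'$, $Ax = Ax'$ and $Lem \subseteq Lem'$;

3. $\forall \varphi \in Lem_{\mathcal{D}}.\; \forall \psi \in Supp(\varphi).\; \exists \sigma.\; \mathcal{D} \vdash loc(\psi) \xrightarrow{\sigma} loc(\varphi) \land \sigma(\psi) = \psi$.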

3 Refactoring Rules 

In the following we present transformation rules on development graphs that
transform a structuring again into a structuring. Using these rules we are able to
structure, step by step, the initially trivial development graph consisting of
exactly one node that comprises all given concepts. This initial development graph
satisfies the conditions of a structuring provided that we have an appropriate
support mapping at hand.

We define four types of structuring-invariant transformations: (i) horizontal
splitting and merging of development graph nodes, (ii) vertical splitting and
merging of development graph nodes, (iii) factorization and multiplication of
development graph nodes, and (iv) removal and insertion of specific links.
Splitting and merging as well as factorization and multiplication are dual operations.
For lack of space, and because we are mainly interested in rules that increase the
structure of a development graph, we omit the formal specification of the
merging and multiplication rules here.

Horizontal Split. The first refactoring rule aims at separating specifications
into independent theories. In terms of the development graph, a node is replaced
by a series of independent nodes, each of which contains a distinct part of a
partitioning of the specification of the original node. In order to ensure a valid
new development graph, each of the new nodes imports the same theories as the
old node and contributes to the same theories as the old node did. To formalize
this rule we need constraints on how to split a specification into different chunks
such that local lemmata are always located in a node which also provides the
axioms and lemmata necessary to prove them.

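Definition 4. Let $S = (\mathcal{D}, loc, Supp)$ be a structuring of $(\Sigma, Ax, Lem)$ and $N \in \mathcal{N}_{\mathcal{D}}$.
A partitioning $\mathcal{P}$ for $N$ is a set $\{N_1, \ldots, N_k\}$ with $k > 1$ such that
1. $sig^N = sig^{N_1} \uplus \ldots \uplus sig^{N_k}$, $ax^N = ax^{N_1} \uplus \ldots \uplus ax^{N_k}$, $lem^N = lem^{N_1} \uplus \ldots \uplus lem^{N_k}$, and
2. $sig^{N_i} \cup ax^{N_i} \cup lem^{N_i} \neq \emptyset$ for $i = 1, \ldots, k$. A node $N_i \in \mathcal{P}$ is lemma independent
iff $Supp(\varphi) \cap (ax^N \cup lem^N) \subseteq (ax^{N_i} \cup lem^{N_i})$ for all $\varphi \in lem^{N_i}$.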
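Both conditions of Definition 4 can be checked mechanically. In this Python sketch (our own simplification) the local domain is a single set of names, so the three disjoint unions for signature, axioms and lemmata are checked together; this presumes that symbol, axiom and lemma names never clash. `supp` maps each lemma to its supporting formulas; all names are hypothetical.

```python
def is_partitioning(dom_n, parts):
    """Conditions 1 and 2 of Definition 4 on a single set of local entity names."""
    if len(parts) <= 1 or any(not p for p in parts):
        return False                      # k > 1 and every part non-empty
    covered = set()
    for p in parts:
        if covered & p:
            return False                  # parts must be pairwise disjoint
        covered |= p
    return covered == dom_n


def is_lemma_independent(part, ax_lem_n, lemmas, supp):
    """Supp(phi) ∩ (ax^N ∪ lem^N) ⊆ part for every lemma phi located in the part."""
    return all((supp[phi] & ax_lem_n) <= part for phi in part & lemmas)


dom_n = {"plus", "zero", "times", "one", "a1", "a2", "l1"}
parts = [{"plus", "zero", "a1", "l1"}, {"times", "one", "a2"}]
print(is_partitioning(dom_n, parts))                                                # True
print(is_lemma_independent(parts[0], {"a1", "a2", "l1"}, {"l1"}, {"l1": {"a1"}}))   # True
```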

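Definition 5 (Horizontal Split). Let $S = ((\mathcal{N}, \mathcal{L}), loc, Supp)$ be a structuring
of $(\Sigma, Ax, Lem)$ and $\mathcal{P} = \{N_1, \ldots, N_k\}$ be a partitioning for some node $N \in \mathcal{N}$ such
that each $N_i \in \mathcal{P}$ is lemma independent and $loc^{-1}(N) = dom^N$. The horizontal
split of $S$ wrt. $N$ and $\mathcal{P}$ is $S' = (\mathcal{D}', loc', Supp)$ with $\mathcal{D}' = (\mathcal{N}', \mathcal{L}')$ where

1. $\mathcal{N}' := \{N_1, \ldots, N_k\} \cup (\mathcal{N} \setminus \{N\})$

2. $\mathcal{L}' := \{M \xrightarrow{\sigma} M' \in \mathcal{L} \mid M \neq N \land M' \neq N\}$
   $\cup\; \{M \xrightarrow{\sigma} N_i \mid M \xrightarrow{\sigma} N \in \mathcal{L},\; i \in \{1, \ldots, k\}\}$
   $\cup\; \{N_i \xrightarrow{\tau|_{Dom_{\mathcal{D}'}(N_i)}} M \mid N \xrightarrow{\tau} M \in \mathcal{L},\; i \in \{1, \ldots, k\}\}$

3. $loc'(e) := N_i$ if $e \in dom^{N_i}$ for some $i \in \{1, \ldots, k\}$, and $loc'(e) := loc(e)$
otherwise,

such that the $Sig_{\mathcal{D}'}(N_i)$ are valid signatures and $ax^{N_i}, lem^{N_i} \subseteq Sen(Sig_{\mathcal{D}'}(N_i))$ for
$i = 1, \ldots, k$.

Fig. 1. Horizontal Split and Merge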
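The link rewiring of Definition 5 can be sketched in Python as follows. The sketch assumes identity morphisms, so the restriction $\tau|_{Dom_{\mathcal{D}'}(N_i)}$ is invisible and links are plain (source, target) pairs; it also does not re-check lemma independence or $loc^{-1}(N) = dom^N$, which the caller must ensure. The function and node names are hypothetical.

```python
def horizontal_split(nodes, links, n, parts):
    """Replace node n by the parts (dict name -> entities), copying n's links to every part."""
    new_nodes = {m: ents for m, ents in nodes.items() if m != n}
    new_nodes.update(parts)
    new_links = [(s, t) for s, t in links if s != n and t != n]
    for p in parts:
        new_links += [(s, p) for s, t in links if t == n]   # each part imports what n imported
        new_links += [(p, t) for s, t in links if s == n]   # each part is imported where n was
    return new_nodes, new_links


nodes = {"AG": {"plus_axioms", "times_axioms"}, "Field": {"distr"}}
links = [("AG", "Field")]
new_nodes, new_links = horizontal_split(
    nodes, links, "AG", {"AGplus": {"plus_axioms"}, "AGtimes": {"times_axioms"}})
print(sorted(new_links))   # [('AGplus', 'Field'), ('AGtimes', 'Field')]
```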

Vertical Split. Analogously to a horizontal split we introduce a vertical split, which
divides a node into two nodes and places one node on top of the other. While
all outgoing links start at the top node, we are free to reallocate incoming links
to either node.

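Definition 6 (Vertical Split). Let $S = ((\mathcal{N}, \mathcal{L}), loc, Supp)$ be a structuring
of $(\Sigma, Ax, Lem)$ and $\mathcal{P} = \{N_1, N_2\}$ be a partitioning for some $N \in \mathcal{N}$ such
that $N_1$ is lemma independent. Then the vertical split of $S$ wrt. $N$ and $\mathcal{P}$ is
$S' = (\mathcal{D}', loc', Supp)$ with $\mathcal{D}' = (\mathcal{N}', \mathcal{L}')$ where

$\mathcal{N}' := \{N_1, N_2\} \cup (\mathcal{N} \setminus \{N\})$

$\mathcal{L}' := \{M \xrightarrow{\sigma} M' \in \mathcal{L} \mid M \neq N \land M' \neq N\} \cup \{N_1 \xrightarrow{id} N_2\}$
$\quad \cup\; \{M \xrightarrow{\sigma} N_1 \mid M \xrightarrow{\sigma} N \in \mathcal{L}\} \cup \{N_2 \xrightarrow{\sigma} M \mid N \xrightarrow{\sigma} M \in \mathcal{L}\}$

$loc'(e) := \begin{cases} N_2 & \text{if } loc(e) = N \text{ and } e \notin Dom_{\mathcal{D}'}(N_1) \\ N_1 & \text{if } loc(e) = N \text{ and } e \in Dom_{\mathcal{D}'}(N_1) \\ loc(e) & \text{otherwise} \end{cases}$

such that $Sig_{\mathcal{D}'}(N_i)$, $i = 1, 2$, are valid signatures and $ax^{N_i}, lem^{N_i} \subseteq Sen(Sig_{\mathcal{D}'}(N_i))$,
$i = 1, 2$. Conversely, $S$ is a vertical merge of $N_1$ and $N_2$ in $S'$.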
Fig. 2. Vertical Split and Merge
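Under simplifying assumptions of our own (identity morphisms, links as (source, target) pairs, hypothetical node names), the vertical split differs from the horizontal one only in how links are redistributed: incoming links go to the lower node $N_1$, outgoing links leave from the upper node $N_2$, and a new link $N_1 \to N_2$ connects the two. The Python sketch below performs the first step of Example 1 below, separating the distributivity axiom from the group axioms.

```python
def vertical_split(nodes, links, n, lower, upper):
    """Split node n into lower = (name, entities) and upper = (name, entities)."""
    (low_name, low_ents), (up_name, up_ents) = lower, upper
    new_nodes = {m: ents for m, ents in nodes.items() if m != n}
    new_nodes[low_name], new_nodes[up_name] = low_ents, up_ents
    new_links = [(s, t) for s, t in links if s != n and t != n]
    new_links.append((low_name, up_name))                        # N1 -> N2
    new_links += [(s, low_name) for s, t in links if t == n]     # imports now enter N1
    new_links += [(up_name, t) for s, t in links if s == n]      # exports now leave N2
    return new_nodes, new_links


field = {"Field": {"distr", "plus_axioms", "times_axioms"}}
new_nodes, new_links = vertical_split(
    field, [], "Field",
    lower=("AG", {"plus_axioms", "times_axioms"}),
    upper=("FieldTop", {"distr"}))
print(new_links)   # [('AG', 'FieldTop')]
```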


Example 1. We illustrate the horizontal and vertical split rules by considering a
single theory axiomatizing a field with binary operations $+$ and $\times$, consisting of
a distributivity axiom ($\Phi_D := \forall x, y, z.\; x \times (y + z) = x \times y + x \times z$) and the axioms
of an Abelian group for $+$ and $\times$, respectively ($\Phi^{+}_{AG} := \forall x, y, z.\; x + (y + z) = (x + y) + z,\;
\forall x, y.\; x + y = y + x,\; \forall x.\; x + 0 = x,\; \forall x.\; x + (-x) = 0$ and
$\Phi^{\times}_{AG} := \forall x, y, z.\; x \times (y \times z) = (x \times y) \times z,\; \forall x, y.\; x \times y = y \times x,\; \forall x.\; x \times 1 = x,\; \forall x.\; x \times inv(x) = 1$).
Assume the axioms are contained in a single node Field, which forms a trivial
structuring. In a first step we can split that node vertically by separating the
distributivity axiom from the other axioms. In a second step we can separate
the Abelian group axioms for $+$ and $\times$ by a horizontal split. This is shown in
the following figure:



Factorization. The factorization rule allows one to merge equivalent specifications
into a single generalized specification and then to represent the individual
ones as instantiations of the generalized specification. A precondition of this rule
is that all individual specifications inherit the same (underlying) theories.

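Definition 7 (Factorization). Let $S = ((\mathcal{N}, \mathcal{L}), loc, Supp)$ be a structuring of
$(\Sigma, Ax, Lem)$. Let $K_1, \ldots, K_n, M_1, \ldots, M_p \in \mathcal{N}$ with $p > 1$ such that $sig^{M_j} \cup ax^{M_j} \neq \emptyset$
and $\exists \sigma_{ij}.\; K_i \xrightarrow{\sigma_{ij}} M_j \in \mathcal{L}$ for $i = 1, \ldots, n$, $j = 1, \ldots, p$.

Suppose there are sets $sig$, $ax$ and $lem$ with $(sig \cup ax \cup lem) \cap Dom_{\mathcal{D}} = \emptyset$ and
signature morphisms $\theta_1, \ldots, \theta_p$ and $\sigma_1, \ldots, \sigma_n$ such that
- $\forall e \in Dom_{\mathcal{D}}(K_i).\; \theta_j(\sigma_i(e)) = \sigma_{ij}(e)$ and $\sigma_i(e) = e \lor \sigma_i(e) \notin Dom_{\mathcal{D}}$
- $sig^{M_j} \subseteq \theta_j(sig) \subseteq Dom_{\mathcal{D}}(M_j)$, $ax^{M_j} \subseteq \theta_j(ax) \subseteq Dom_{\mathcal{D}}(M_j)$
- for all $e \in lem$: $\exists l \in \{1, \ldots, p\}.\; \theta_l(e) \in lem^{M_l}$; $\theta_i(e) = \theta_j(e)$ implies $i = j$; and
$\theta_j(e) \in Dom_{\mathcal{D}}$ implies $loc(\theta_j(e)) = M_j$
- there is a support mapping $Supp_N$ for $ax \cup \bigcup_{i=1}^{n} \sigma_i(Dom_{\mathcal{D}}(K_i))$ and $lem$.

Then $S' = ((\mathcal{N}', \mathcal{L}'), loc', Supp')$ is a factorization of $S$ wrt. $M_1, \ldots, M_p$ and
$Supp_N$ iff

$\mathcal{N}' := \{N\} \cup \{N_j \mid j \in \{1, \ldots, p\}\} \cup \mathcal{N} \setminus \{M_1, \ldots, M_p\}$

with $N = (sig, ax, lem)$ and $N_j = (\emptyset, \emptyset, lem^{M_j} \setminus \theta_j(lem))$,

$\mathcal{L}' := \{K \xrightarrow{\sigma} K' \in \mathcal{L} \mid K, K' \notin \{M_1, \ldots, M_p\}\}$
$\quad \cup\; \{K_i \xrightarrow{\sigma_i} N \mid K_i \xrightarrow{\sigma_{ij}} M_j,\; j \in \{1, \ldots, p\},\; i \in \{1, \ldots, n\}\}$
$\quad \cup\; \{N \xrightarrow{\theta_j} N_j \mid j \in \{1, \ldots, p\}\}$
$\quad \cup\; \{K \xrightarrow{\sigma} N_j \mid K \xrightarrow{\sigma} M_j \land (\forall i \in \{1, \ldots, n\}.\; K \neq K_i \lor \sigma \neq \sigma_{ij})\}$
$\quad \cup\; \{N_j \xrightarrow{\sigma} K \mid M_j \xrightarrow{\sigma} K \in \mathcal{L},\; j \in \{1, \ldots, p\}\}$

$loc'(x) := \begin{cases} N & \text{if } x \in Dom_{\mathcal{D}'}(N) \setminus \bigcup_{i=1}^{n} Dom_{\mathcal{D}'}(K_i) \\ N_j & \text{if } x \in Dom_{\mathcal{D}'}(N_j) \text{ and } \forall K \xrightarrow{\sigma} N_j.\; x \notin Dom_{\mathcal{D}'}(K) \\ loc(x) & \text{otherwise} \end{cases}$

$Supp' := Supp \cup Supp_N$.

Fig. 3. Factorization (with $\sigma_{ij} := \theta_j \circ \sigma_i$)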

Example 2. Consider again our field example, which we have transformed into
the structuring (3) (p. 6). On this last structuring (3) we can apply the
factorization rule to extract the general Abelian group axioms
($\Phi_{AG} := \forall x, y, z.\; x \circ (y \circ z) = (x \circ y) \circ z,\; \forall x, y.\; x \circ y = y \circ x,\; \forall x.\; x \circ e = x,\; \forall x.\; x \circ i(x) = e$)
and obtain the respective axioms for $+$ and $\times$ by the morphisms $\sigma_1 := \circ \mapsto +, e \mapsto 0, i \mapsto -$
and $\sigma_2 := \circ \mapsto \times, e \mapsto 1, i \mapsto inv$. This is illustrated in the following
diagram; the final structuring contains 5 axioms, whereas the initial structuring
contained 9 axioms.
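The bookkeeping of Example 2 can be replayed in a few lines of Python. This sketch writes the axioms as strings in prefix notation, uses `neg` and `inv` as our own names for unary minus and the multiplicative inverse, and implements the morphisms $\sigma_1$ and $\sigma_2$ as word-level renamings. It checks that both concrete axiomatizations are renamings of $\Phi_{AG}$ and recounts the stored axioms.

```python
import re

# Generic Abelian group axioms over the abstract symbols o, e, i (prefix notation).
PHI_AG = [
    "forall x,y,z. o(x, o(y, z)) = o(o(x, y), z)",
    "forall x,y. o(x, y) = o(y, x)",
    "forall x. o(x, e) = x",
    "forall x. o(x, i(x)) = e",
]
PHI_PLUS = [
    "forall x,y,z. plus(x, plus(y, z)) = plus(plus(x, y), z)",
    "forall x,y. plus(x, y) = plus(y, x)",
    "forall x. plus(x, zero) = x",
    "forall x. plus(x, neg(x)) = zero",
]
PHI_TIMES = [
    "forall x,y,z. times(x, times(y, z)) = times(times(x, y), z)",
    "forall x,y. times(x, y) = times(y, x)",
    "forall x. times(x, one) = x",
    "forall x. times(x, inv(x)) = one",
]
SIGMA_1 = {"o": "plus", "e": "zero", "i": "neg"}
SIGMA_2 = {"o": "times", "e": "one", "i": "inv"}


def rename(formula, morphism):
    """Apply a symbol renaming to every lower-case word; unmapped words stay unchanged."""
    return re.sub(r"\b[a-z]+\b", lambda m: morphism.get(m.group(0), m.group(0)), formula)


assert [rename(f, SIGMA_1) for f in PHI_AG] == PHI_PLUS
assert [rename(f, SIGMA_2) for f in PHI_AG] == PHI_TIMES

before = 1 + len(PHI_PLUS) + len(PHI_TIMES)   # distributivity + both group axiomatizations
after = 1 + len(PHI_AG)                        # distributivity + generic group axioms
print(before, after)                           # 9 5
```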
The factorization rule only covers a sufficient criterion, demanding that each
theory imported by a definition link into one specification is also imported via
definition links by all other specifications. The more complex case in which a
theory is imported via a path of links can be handled by allowing one to shortcut
a path by a single global link. This results in the following rule.

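Definition 8 (Transitive Enrichment). Let $S = ((\mathcal{N}, \mathcal{L}), loc, Supp)$ be a
structuring of $(\Sigma, Ax, Lem)$, let $K, N \in \mathcal{N}$, and let there be a path $\mathcal{D} \vdash K \xrightarrow{\sigma} N$
between both. Then $S' = ((\mathcal{N}, \mathcal{L} \cup \{K \xrightarrow{\sigma} N\}), loc, Supp)$ is a transitive enrichment
of $S$.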
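The morphism of the new link is the composite along the path, $\sigma = \sigma'' \circ \sigma'$, as in the definition of global reachability. A minimal Python sketch, again representing morphisms as renaming dictionaries (our assumption) and using a hypothetical helper name:

```python
def shortcut_morphism(path_morphisms, source_sig):
    """Compose the renamings along a path K -> ... -> N, applying the first link first."""
    result = {}
    for sym in source_sig:
        image = sym
        for morphism in path_morphisms:
            image = morphism.get(image, image)   # unmapped symbols are kept
        result[sym] = image
    return result


# Hypothetical path K --(o -> plus)--> M --(plus -> add)--> N
print(shortcut_morphism([{"o": "plus"}, {"plus": "add"}], {"o", "e"}))
```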

Definition links in a development graph can be redundant if there are
alternative paths which have the same morphisms or if they are not used in any
reachable node of the target. We formalize these notions as follows:

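Definition 9 (Removable Link). Let $S = (\mathcal{D}, loc, Supp)$ ($\mathcal{D} = (\mathcal{N}, \mathcal{L})$) be a
structuring of $(\Sigma, Ax, Lem)$. Let $l \in \mathcal{L}$ be a link into a node $N$ and $\mathcal{D}' = (\mathcal{N}, \mathcal{L} \setminus \{l\})$.
$l$ is removable from $S$, and $S' = (\mathcal{D}', loc, Supp)$ is a reduction of $S$, iff

1. $\forall l' : M \xrightarrow{\sigma} N$. if $l'$ exclusively provides $\sigma(e)$ from some $e \in Dom_{\mathcal{D}}(M)$,
then $e \in Dom_{\mathcal{D}'}(N)$ and $l \neq l'$;

2. $\forall e \in Dom_{\mathcal{D}}.\; \forall M \in \lceil\mathcal{D}\rceil$. if $\mathcal{D} \vdash loc(e) \xrightarrow{\sigma} M$, then there exists $M' \in \lceil\mathcal{D}'\rceil$ such
that $\mathcal{D}' \vdash loc(e) \xrightarrow{\sigma} M'$;

3. $\forall \varphi \in Lem_{\mathcal{D}}.\; Supp(\varphi) \subseteq Dom_{\mathcal{D}'}(N)$, and $Sig_{\mathcal{D}}(N) \subseteq Dom_{\mathcal{D}'}(N)$.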

Theorem 1 (Structuring Preservation). Let $S := (\mathcal{D}, loc, Supp)$ ($\mathcal{D} = (\mathcal{N}, \mathcal{L})$)
be a structuring of $(\Sigma, Ax, Lem)$. Then

1. every horizontal split of $S$ wrt. some $N \in \mathcal{N}$ and partitioning $\mathcal{P}$ of $N$,

2. every vertical split of $S$ wrt. some $N \in \mathcal{N}$ and partitioning $\mathcal{P}$ of $N$,

3. every factorization of $S$ wrt. nodes $M_1, \ldots, M_p \in \mathcal{N}$,

4. every transitive enrichment of $S$, and

5. every reduction of $S$

is a structuring of $(\Sigma, Ax, Lem)$.

The theorem follows from the soundness proofs for each rule given in the appendix.
4 Refactoring Process

In order to evaluate the refactoring rules on real theories, we have implemented
development graphs and the rules in Scala² and added support for reading
formulas in TSTP format [S] using the Java parser from [8]. The support mapping
is given as an extra data structure recording which formulas have been used in
the proof of a theorem. In the case of TSTP, we extract that information from the
files by using the names of the formulas. Since the TSTP format does not include
signature declarations, we add declarations for all symbols occurring in a TSTP
file in an initialization step. We used the untyped part of TSTP; hence the
declarations contain only arity information but no types.
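Adding arity-only declarations amounts to collecting every function and predicate symbol together with its number of arguments. Below is a rough Python sketch, assuming the formula body has already been extracted from its `fof(...)` wrapper and covering only plain TPTP syntax (lower-case functors, upper-case variables; `$`-prefixed symbols are skipped, quoted atoms are not handled). `symbol_arities` is a hypothetical helper; the implementation described above relies on the Java TPTP parser instead.

```python
import re

# Lower-case words (optionally $-prefixed), upper-case variables, or single punctuation chars.
TOKEN = re.compile(r"\$?[a-z][A-Za-z0-9_]*|[A-Z][A-Za-z0-9_]*|\S")


def symbol_arities(formula):
    """Map each lower-case functor/predicate of a fof formula body to its arity."""
    tokens = TOKEN.findall(formula)
    arities = {}
    for pos, tok in enumerate(tokens):
        if not tok[0].islower():
            continue                      # variables, $-symbols, connectives, brackets
        arity = 0
        if pos + 1 < len(tokens) and tokens[pos + 1] == "(":
            depth, arity = 0, 1
            for t in tokens[pos + 1:]:
                if t in ("(", "["):
                    depth += 1
                elif t in (")", "]"):
                    depth -= 1
                    if depth == 0:
                        break
                elif t == "," and depth == 1:
                    arity += 1            # a top-level comma separates arguments
        arities.setdefault(tok, arity)    # keep the first arity seen
    return arities


print(symbol_arities("![X,Y]: (plus(X, zero) = X & p(f(X, Y)))"))
# {'plus': 2, 'zero': 0, 'p': 1, 'f': 2}
```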

The refactoring rules are parameterized over the theories and possibly over the
subsets of the local signature, axioms and lemmata to split over. To compute
this parametric information we provide some basic heuristic tactics. Using the
support mapping, we define that an axiom (resp. lemma) depends on a symbol
declaration if the symbol occurs in the axiom (resp. lemma), and that a lemma
depends on another axiom or lemma if the latter is in its support mapping. A
symbol declaration is always independent. This dependency relation induces a
partial order on the local domain of each node in a development graph.
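The dependency relation just described can be computed directly from symbol occurrences and the support mapping. In this Python sketch the input format (formulas given by the set of symbols they contain) and the function name are our own assumptions.

```python
def dependencies(symbols, formulas, supp):
    """Dependency relation on a node's local domain, following the description above.

    symbols:  locally declared symbol names (always independent)
    formulas: local axiom/lemma name -> set of symbols occurring in it
    supp:     lemma name -> set of supporting axiom/lemma names
    Returns entity -> set of local entities it depends on; imported symbols and
    imported supporters are dropped because they are not part of the local domain.
    """
    local = set(symbols) | set(formulas)
    deps = {s: set() for s in symbols}
    for name, syms in formulas.items():
        deps[name] = (set(syms) | supp.get(name, set())) & local
    return deps


deps = dependencies({"plus", "zero"},
                    {"a1": {"plus", "zero"}, "l1": {"plus"}},
                    {"l1": {"a1"}})
print(sorted(deps["l1"]))   # ['a1', 'plus']
```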

Tactic for horizontal split. This rule requires partitioning the local signature,
axioms and lemmas of a given theory into independent parts such that, given the
same imports as the original node, each part is a valid theory and lemma
independent of the other parts. We implemented a heuristic that, given the local
domain of some node, searches for a largest subset which has a non-empty
intersection of its occurring symbols and supporting axioms and lemmata. If
such a set exists, the largest such set is used to split the theory horizontally into
that set and the rest.
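The heuristic is described only briefly here. As a rough stand-in, and explicitly not the authors' algorithm, one can group the local domain into connected components of the dependency relation (shared symbols or support) and split off the largest component whenever there is more than one. Entities in different components share no symbols and no support, so each part remains lemma independent. The Python sketch takes a hypothetical dependency dictionary mapping each local entity to the entities it depends on.

```python
def components(deps):
    """Connected components of the undirected graph induced by a dependency dict."""
    adj = {e: set() for e in deps}
    for e, ds in deps.items():
        for d in ds:
            adj[e].add(d)
            adj.setdefault(d, set()).add(e)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(comp)
    return comps


def horizontal_split_candidate(deps):
    """Return (largest component, rest) if the local domain falls apart, else None."""
    comps = components(deps)
    if len(comps) < 2:
        return None
    largest = max(comps, key=len)
    return largest, set().union(*comps) - largest


deps = {"plus": set(), "zero": set(), "times": set(), "one": set(),
        "a1": {"plus", "zero"}, "a2": {"times", "one"}}
part, rest = horizontal_split_candidate(deps)
print(sorted(part), sorted(rest))
```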

Tactics for vertical split. The rule requires finding a subset of the local domain
which is independent of the rest and using it as the content of the lower theory.
We implemented two heuristics to search for this subset. First, we consider all
maximal elements wrt. the dependency relation and use them as content for the
new upper theory constructed by the vertical split. Second, we consider all minimal
elements and use them as content for the lower theory constructed by the vertical split.
These two tactics allow one to incrementally split a theory into layered slices of
the dependency relation.
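Minimal and maximal elements are easy to read off a dependency dictionary, and repeatedly peeling off the minimal elements yields exactly such layered slices. A Python sketch under our own data-layout assumption (entity mapped to the set of local entities it depends on); the function names are hypothetical.

```python
def minimal_elements(deps):
    """Entities depending on nothing else in the local domain (lower-theory candidates)."""
    return {e for e, d in deps.items() if not d}


def maximal_elements(deps):
    """Entities no other local entity depends on (upper-theory candidates)."""
    used = set().union(*deps.values())
    return set(deps) - used


def layers(deps):
    """Repeatedly peel off minimal elements, like iterating the minimal vertical split."""
    remaining = {e: set(d) for e, d in deps.items()}
    result = []
    while remaining:
        layer = minimal_elements(remaining)
        if not layer:
            raise ValueError("dependency relation is cyclic")
        result.append(layer)
        remaining = {e: d - layer for e, d in remaining.items() if e not in layer}
    return result


deps = {"plus": set(), "zero": set(), "a1": {"plus", "zero"}, "l1": {"plus", "a1"}}
print(maximal_elements(deps))   # {'l1'}
print(layers(deps))
```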

Tactic for factorization. This rule requires finding isomorphic subsets in two
different theories to factorize over. The notion of isomorphism between formulas
is very strict, as we only search for renamings. Furthermore, we extended the
isomorphism to the support mapping such that lemmata can only be identified
with isomorphic lemmata whose supporting axioms and lemmata are also
isomorphic wrt. the same renaming. Thus, an axiom can never be factorized with a
lemma and vice versa. Even with that strict notion, computing such subsets
is already expensive. If the entire local domain of a given node is isomorphic to
the local domain of the second node, both nodes are factorized according to the
definition of the factorization rule. If the identified subset in the first node does
not cover the complete second node, we first try to split the second node to
isolate the subset. To this end we first try to split the second node horizontally
using the identified subset. If that fails, we try to split it vertically, using the
subset first as the upper part and finally as the lower part. If one of these splittings
was successful, the factorization is applied to the isolated part. Otherwise the
factorization fails.

² http://www.scala-lang.org/
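A minimal Python sketch of the renaming check, under assumptions of our own: formulas are nested tuples `(symbol, arg1, ...)`, variables are strings that must match exactly, equality is never renamed, and two formula lists are compared position by position instead of searching over all subsets (the expensive part mentioned above). The extension to the support mapping is omitted, and all names are hypothetical.

```python
FIXED = {"="}   # logical symbols that must not be renamed (assumption)


def match(t1, t2, ren, inv):
    """Try to extend the renaming ren (inverse inv) so that ren(t1) == t2.

    Terms are tuples (symbol, arg1, ...); variables are plain strings and must coincide
    (bound-variable renaming is not handled). ren/inv may be partially extended even
    when matching fails, so callers should start from fresh dictionaries."""
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2
    (f, *args1), (g, *args2) = t1, t2
    if len(args1) != len(args2):
        return False
    if (f in FIXED or g in FIXED) and f != g:
        return False
    if ren.get(f, g) != g or inv.get(g, f) != f:
        return False                       # would make ren inconsistent or non-injective
    ren[f], inv[g] = g, f
    return all(match(a, b, ren, inv) for a, b in zip(args1, args2))


def find_renaming(fs1, fs2):
    """Renaming mapping the formula list fs1 onto fs2 position by position, or None."""
    ren, inv = {}, {}
    if len(fs1) == len(fs2) and all(match(a, b, ren, inv) for a, b in zip(fs1, fs2)):
        return {f: g for f, g in ren.items() if f not in FIXED}
    return None


comm_plus = ("=", ("plus", "X", "Y"), ("plus", "Y", "X"))
unit_plus = ("=", ("plus", "X", ("zero",)), "X")
comm_times = ("=", ("times", "X", "Y"), ("times", "Y", "X"))
unit_times = ("=", ("times", "X", ("one",)), "X")
print(find_renaming([comm_plus, unit_plus], [comm_times, unit_times]))
# {'plus': 'times', 'zero': 'one'}
```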

In addition to these main tactics, we have implemented tactics to delete
superfluous links as well as empty nodes; the latter technically corresponds
to vertically merging an empty node with its importing theories.

Automatic Procedure. In order to automate the theory formation process we have
implemented the usual tacticals to describe more complex search behaviors.
Starting from the basic tactics described above, the tactic language is defined
as follows:

T ::= SplitHorizontally | SplitVerticallyMaximal | SplitVerticallyMinimal
    | Factorize | RemoveSuperfluousEmptyTheories
    | T* | T+ | T; T | T onfail T

The tactics take a structuring as argument and, if they can be applied,
return a new structuring; otherwise they fail. The tacticals for as-many-as-possible
iteration (*), as-many-as-possible-but-at-least-once iteration (+) and sequencing (;)
are standard. The tactical onfail executes the second tactic expression only if the
first one failed. Using this language we have implemented the following automatic
procedure. Starting from an unstructured graph, i.e., a single theory containing
all declarations, axioms and lemmata, the goal of the procedure is to search for
possibilities to factorize common patterns. Factorization is only possible if
at least one application of the horizontal split rule was possible, which in turn
may require the application of a preparatory vertical split. Following that initial
part, we try to split further vertically using the maximal elements of the theory
and finally remove the superfluous links and empty theories. Hence, the initial
phase of the automation consists of

inittac = ((SplitVerticallyMinimalEntries+; SplitHorizontally*)
            onfail SplitHorizontally+);
          SplitVerticallyMaximalEntries*;
          RemoveSuperfluousEmptyTheories*

This initialization tactic succeeds only if at least one vertical split or one
horizontal split could be done. Following that, we start to factorize. If at least one
factorization was possible, we first clean up the structuring by removing
superfluous links and empty theories before trying again to split vertically. The
overall tactic is thus

inittac; (Factorize+; RemoveSuperfluousEmptyTheories*;
          SplitVerticallyMinimalEntries*)*

Article               Reduction  Timeout
binop_2.top.rated     5%         yes
bintree1.top.rated    2%         no
cfuncdom.top.rated    2%         no
ff_siec.top.rated     2%         no
finsub_1.top.rated    2%         no
heine.top.rated       1%         no
membered.top.rated    38%        no
mssubfam.top.rated    1%         no
msualg_1.top.rated    2%         no
power.top.rated       1%         yes
qc_lang1.top.rated    1%         no
rsspace.top.rated     2%         no
setfam_1.top.rated    4%         no

Fig. 4. Factorization results on TSTP versions of the Mizar articles
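The tacticals have a direct functional reading. In this Python sketch (hypothetical names, not the authors' Scala code) a tactic is any function that maps a structuring to a new structuring or returns `None` to signal failure; `build_procedure` assembles `inittac` and the overall tactic from five basic tactics supplied by the caller. As with any `T*`, the basic tactics must eventually fail, otherwise the iteration does not terminate.

```python
def star(t):
    """T*: apply t as often as possible; never fails."""
    def run(s):
        while (nxt := t(s)) is not None:
            s = nxt
        return s
    return run


def plus(t):
    """T+: like T*, but t must succeed at least once."""
    def run(s):
        first = t(s)
        return None if first is None else star(t)(first)
    return run


def seq(*ts):
    """T1; T2; ...: fails as soon as one component fails."""
    def run(s):
        for t in ts:
            s = t(s)
            if s is None:
                return None
        return s
    return run


def onfail(t1, t2):
    """T1 onfail T2: run t2 on the original input only if t1 failed."""
    def run(s):
        r = t1(s)
        return r if r is not None else t2(s)
    return run


def build_procedure(split_h, split_v_min, split_v_max, factorize, cleanup):
    inittac = seq(
        onfail(seq(plus(split_v_min), star(split_h)), plus(split_h)),
        star(split_v_max),
        star(cleanup),
    )
    return seq(inittac, star(seq(plus(factorize), star(cleanup), star(split_v_min))))


# Toy usage: a "structuring" is just an int and only the horizontal split succeeds (up to 3).
def fail(_):
    return None


def split_h(s):
    return s + 1 if s < 3 else None


print(build_procedure(split_h, fail, fail, fail, fail)(0))   # 3
```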

5 Evaluation 

We have applied the factorization procedure presented in the previous section to
TSTP versions of the Mizar library articles (www.mizar.org), which have been
created by Josef Urban and are available at
http://www.cs.miami.edu/~tptp/MizarTPTP/TPTPArticles/. This is a collection of
922 files in TSTP format (www.cs.miami.edu/~tptp/TSTP) in which theorems are
annotated with information about which theorems and axioms have been used in
their proofs. The files consist of the axioms and theorems of each article including
all directly included articles, but without transitive expansion of all inclusions.
Hence, the knowledge in each file is already quite tailored to the knowledge
necessary to define the additional mathematical concepts and to enable the proofs
of the theorems. We have run the procedure on all examples with a timeout of
5 minutes each. The environment was a virtual machine with 4 virtual CPUs and
16 GB RAM under openSUSE 12.2 64-bit, running on a host with 2 Intel Xeon
Westmere E5620 quad-core CPUs (2.4 GHz), 96 GB RAM and VMware ESXi 4.1.

For most articles no factorization has been found. However, there are 13
articles where factorization was possible; they are presented in the table in Fig. 4.
For each file, the Axioms column gives the number of axioms in the initial and
in the final development graph. Analogously, the Theorems column gives the
number of theorems in the initial and in the final development graph. The
Reduction column indicates how much the factorization reduced the overall
number of axioms and theorems. The last column indicates whether the automatic
procedure terminated within the 5-minute time frame or the timeout was reached.

While reducing the number of axioms by factorization is already interesting in
order to reduce the search space for automatic provers, reducing the number of
theorems is more interesting, as it means fewer theorems to prove. For all but one
of the files in which factorizations have been found, only axioms could be
factorized. However, in the article membered.top.rated, obtained from the Mizar
article “On the Sets Inhabited by Numbers”, we could factorize 36 theorems into
16 theorems.

On closer inspection this is not surprising, because it concerned theorems about
sets of reals, sets of rationals, sets of integers, sets of naturals and sets of
complex numbers, all defined and proved according to the same schema. The
resulting development graph is shown in Fig. 5, and the factor theory containing
the 5 theorems, from which all others are obtained by renaming, is node 9 in
gray/orange. The factorization is visible via the 5 outgoing edges towards node 11,
which are annotated with the respective morphisms.

Fig. 5. Resulting DG
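The reported numbers are consistent with a simple count, under our reading that each of the five renamings (the 5 outgoing edges) instantiates all 5 theorems of the factor theory: 25 stored theorems are replaced by the 5 generic ones, so

$$36 - (5 \cdot 5 - 5) = 36 - 20 = 16.$$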



6 Related Work and Conclusion 

Related to the structuring of theories, there is a large body of work on
anti-unification, i.e., computing common generalizations of different formulas or
theories (e.g. 1270). The resulting structuring approach is primarily bottom-up and
driven by the mere existence of anti-unifiers. In contrast, our approach is top-down,
as it introduces measures for the intended structuring (i.e., semantic
appropriateness, closure and minimality) to guide the formation process. For
example, we split up theories into smaller ones that are nevertheless self-contained
in the sense that each theorem of the original theory can be proven in one of the
new (smaller) ones. Anti-unification is an important technique, for instance to test
the applicability of the factorization rule, but the applicability of a rule is not the
driving force of the formation process.

In this paper we were concerned with revealing shared definitions,
axiomatizations and theorems in a given formal theory. Based on structurings,
which extend development graphs with notions to exclude redundancies and to
include dependency information, we presented a set of rules on structurings. We
implemented the rules with simple heuristics to detect isomorphic subsets, which
are sufficient to find simple factorizations, and applied them to the TSTP
formulations of the Mizar articles. Unsurprisingly, not many factorizations could
be found, which is due to Mizar's non-transitive principle of reusing other articles and to the fact
Structure Formation in Large Theories
that these were chosen carefully by the authors of the Mizar articles. Moreover,
the heuristics to compute isomorphic axioms and theorems were very restricted.
However, a few factorizations could be found, in particular one where the number
of theorems could be more than halved. This indicates that adding theory morphisms
to the Mizar language may be useful, but that needs to be confirmed by further
analysis of larger subsets. On the other hand, the non-transitive import mechanism
of Mizar already seems to allow for a good organization of the knowledge. That
kind of mechanism is typically not implemented in specification languages, but
exists in development graphs in the form of local axiom links.

Future work will consist of analyzing larger subsets of the whole Mizar library,
i.e., sets of Mizar articles, for possible factorizations. We also plan to apply the
approach to libraries of other proof assistants, provided we can obtain the
dependency information about which axioms/theorems have been used in which
proof. Other automation tactics, and especially heuristics to identify isomorphic
formulas, also need to be explored, as well as heuristics to identify subsets for
horizontal and vertical splits. On a more theoretical level, we will investigate how
axioms and theorems could be identified in order to factorize alternative
axiomatizations of the same theory without losing information, such as alternative
ways to axiomatize groups. Finally, the whole system can be applied to any untyped
first-order subset of TPTP theories to search for redundancies. However, the
resulting development graphs cannot be saved as TPTP theories, as TPTP does
not support renaming. Hence, we propose to extend the TPTP language in that
respect.
Proof of Theorem 1 (Structure Preservation) 

Horizontal Split 

It holds trivially that $\mathrm{Dom}_{\mathcal{D}} = \mathrm{Dom}_{\mathcal{D}'}$. 

– $\mathrm{loc}'$ is surjective because by construction each $N_i$, $i = 1, \dots, k$, has a local 
entity. Furthermore, for each $N_i$ and each $e \in \mathrm{dom}^{N_i}$, $\mathrm{loc}'(e) = N_i$ holds by 
construction. Moreover, since $\mathrm{loc}^{-1}(N) = \mathrm{dom}^N$, none of the incoming 
links into $N$ provided any entity, and consequently none of the incoming 
links into $N_1, \dots, N_k$ do. Hence, $\mathrm{loc}'^{-1}(N_i) = \mathrm{dom}^{N_i}$ for $i = 1, \dots, k$, and since 
$\mathrm{dom}^N := \mathrm{dom}^{N_1} \uplus \dots \uplus \mathrm{dom}^{N_k}$, $\mathrm{loc}'(e)$ is unique for $e \in \mathrm{dom}^N$. 

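– If $N$ is not a top-level node in $\mathcal{D}$, then the entities provided at the maximal nodes of 
$\mathcal{D}'$ and $\mathcal{D}$ coincide, namely $\Sigma \uplus \mathrm{Ax} \uplus \mathrm{Lem}$, because the domains of nodes 
reachable from $N$ are not affected by the horizontal split. If $N$ is a top-level node, 
then all $N_i$ with $1 \leq i \leq k$ are top-level nodes. Since 
$\mathrm{dom}^N = \mathrm{dom}^{N_1} \uplus \dots \uplus \mathrm{dom}^{N_k}$ and 
$\mathrm{Imports}_{\mathcal{D}}(N) = \mathrm{Imports}_{\mathcal{D}'}(N_1) = \dots = \mathrm{Imports}_{\mathcal{D}'}(N_k)$, it holds that 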

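$$
\begin{aligned}
\mathrm{Dom}_{\mathcal{D}}(N) &= \mathrm{dom}^N \cup \mathrm{Imports}_{\mathcal{D}}(N) = \mathrm{dom}^{N_1} \cup \dots \cup \mathrm{dom}^{N_k} \cup \mathrm{Imports}_{\mathcal{D}}(N)\\
&= \mathrm{dom}^{N_1} \cup \dots \cup \mathrm{dom}^{N_k} \cup \mathrm{Imports}_{\mathcal{D}'}(N_1) \cup \dots \cup \mathrm{Imports}_{\mathcal{D}'}(N_k)\\
&= \mathrm{dom}^{N_1} \cup \mathrm{Imports}_{\mathcal{D}'}(N_1) \cup \dots \cup \mathrm{dom}^{N_k} \cup \mathrm{Imports}_{\mathcal{D}'}(N_k)\\
&= \mathrm{Dom}_{\mathcal{D}'}(N_1) \cup \dots \cup \mathrm{Dom}_{\mathcal{D}'}(N_k)
\end{aligned}
$$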

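Thus, also in this case the maximal nodes of $\mathcal{D}'$ provide exactly the entities $\Sigma \uplus \mathrm{Ax} \uplus \mathrm{Lem}$ provided by those of $\mathcal{D}$. 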

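– Assume $\varphi \in \mathrm{Lem}_{\mathcal{D}}$ and $\psi \in \mathrm{Supp}(\varphi)$. If $\mathrm{loc}(\psi) \neq N$ and $\mathrm{loc}(\varphi) \neq N$, then 
both $\mathrm{loc}(\psi)$ and $\mathrm{loc}(\varphi)$ are in $\mathcal{D}'$, and we consider the path $p : \mathrm{loc}(\psi) \xrightarrow{\sigma} \mathrm{loc}(\varphi)$. If 
$N \in p$, then $p = [p_1, M \xrightarrow{\delta} N \xrightarrow{\tau} M', p_2]$, and by construction the paths 
$[p_1, M \xrightarrow{\delta} N_i \xrightarrow{\tau_i} M', p_2]$ with $\tau_i = \tau|_{\mathrm{Dom}_{\mathcal{D}'}(N_i)}$ are in $\mathcal{D}'$ for $1 \leq i \leq k$. Since $\mathrm{loc}(\psi) \neq N$, 
each $\tau_i$ behaves equivalently on the image of $\psi$ imported in $N_i$, and 
hence $\mathrm{loc}'(\psi) \xrightarrow{\sigma'} \mathrm{loc}'(\varphi)$ for some $\sigma'$ such that $\sigma'(\psi) = \sigma(\psi)$. If $N \notin p$, 
then $p$ is also a path in $\mathcal{D}'$ and $\mathrm{loc}'(\psi) \xrightarrow{\sigma} \mathrm{loc}'(\varphi)$ holds trivially. 

If $\mathrm{loc}(\varphi) = N$, then, since all $N_i$ are mutually lemma independent, we can assume without 
loss of generality that $\varphi \in \mathrm{ax}^{N_1} \cup \mathrm{lem}^{N_1}$ and thus $\mathrm{loc}'(\varphi) = N_1$. 
If $\mathrm{loc}(\psi) = N$, then $\psi \in \mathrm{ax}^{N_1} \cup \mathrm{lem}^{N_1}$ because $N_1$ is lemma independent. 
Thus, $\mathrm{loc}'(\psi) = N_1$ and $\mathrm{loc}'(\psi) = N_1 \xrightarrow{\mathrm{id}} N_1 = \mathrm{loc}'(\varphi)$ holds trivially. 
Otherwise, $\mathrm{loc}(\psi) = \mathrm{loc}'(\psi)$, and since $N$ was reachable from $\mathrm{loc}(\psi)$, by 
construction $N_1$ is also reachable from $\mathrm{loc}'(\psi)$. 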
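The top-level case of the horizontal split can be checked on a toy instance. The following Python sketch is not the paper's implementation; it assumes identity morphisms, global links only, and a top-level node `N` whose local entities are partitioned into disjoint blocks, with every incoming link of `N` copied to each new node. The helpers `dom_vis` and `horizontal_split` are hypothetical names. For the example, it confirms that $\mathrm{Dom}_{\mathcal{D}}(N)$ equals the union of the $\mathrm{Dom}_{\mathcal{D}'}(N_i)$.

```python
# Hypothetical toy model: identity morphisms, global links only, acyclic graph.

def dom_vis(node, dom, links):
    """Dom(node): local entities plus Dom of every node with a link into it."""
    result = set(dom[node])
    for src, tgt in links:
        if tgt == node:
            result |= dom_vis(src, dom, links)
    return result


def horizontal_split(dom, links, n, parts):
    """Split top-level node n into n_1..n_k, one per block of the partition
    `parts` of dom[n]; every incoming link of n is copied to each n_i."""
    assert all(src != n for src, _ in links), "sketch covers top-level nodes only"
    assert set().union(*parts) == dom[n]
    assert sum(len(p) for p in parts) == len(dom[n]), "blocks must be disjoint"
    names = [f"{n}_{i}" for i in range(1, len(parts) + 1)]
    new_dom = {k: set(v) for k, v in dom.items() if k != n}
    new_dom.update({name: set(part) for name, part in zip(names, parts)})
    new_links = [(s, t) for s, t in links if t != n]
    new_links += [(s, name) for s, t in links if t == n for name in names]
    return new_dom, new_links, names


dom = {"K": {"k1", "k2"}, "N": {"x", "y", "z"}}
links = [("K", "N")]
new_dom, new_links, names = horizontal_split(dom, links, "N", [{"x"}, {"y", "z"}])

before = dom_vis("N", dom, links)
after = set().union(*(dom_vis(m, new_dom, new_links) for m in names))
print(before == after)  # True
```

Copying each incoming link to every $N_i$ mirrors the condition $\mathrm{Imports}_{\mathcal{D}}(N) = \mathrm{Imports}_{\mathcal{D}'}(N_i)$ used in the proof.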
Vertical Split 

– First, we have to prove that $\mathrm{loc}'$ is a location mapping. $\mathrm{loc}'$ is surjective 
because by construction each node $N_i$ (with $i = 1, 2$) has some local entity 
$e \in \mathrm{dom}^{N_i}$. Thus $\mathrm{loc}'(e) = N_i$, and $N_i$ is in the range of $\mathrm{loc}'$. Furthermore, 
$\forall e \in \mathrm{dom}^{N_i}.\ \mathrm{loc}'(e) = N_i$ holds by definition. Finally, let $e \in \mathrm{Dom}_{\mathcal{D}'} = \mathrm{Dom}_{\mathcal{D}}$: 
$\mathrm{loc}'(e) = N_i$ implies $\mathrm{loc}(e) = N$, and therefore there is no node in 
$\mathcal{N} \setminus \{N\}$ which provides $e$. Furthermore, since $N_1 \xrightarrow{*} N_2 \in \mathcal{L}$, $N_1$ and 
$N_2$ cannot provide the same entity $e$. 

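– By definition, $e \in \mathrm{dom}^{N_i}$ implies $\mathrm{loc}'(e) = N_i$ for $i = 1, 2$ in $\mathcal{D}'$. For all 
nodes of $\mathcal{D}'$ other than $N_1$ and $N_2$, the property is inherited from $(\mathcal{D}, \mathrm{loc}, \mathrm{Supp})$ being 
a structuring, since $\mathrm{loc}'(e) = \mathrm{loc}(e)$ if $\mathrm{loc}(e) \neq N$. 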

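– Since $\mathrm{Dom}_{\mathcal{D}}(N) = \mathrm{Dom}_{\mathcal{D}'}(N_2)$ and $N \xrightarrow{\sigma} M \in \mathcal{D}$ iff $N_2 \xrightarrow{\sigma} M \in \mathcal{D}'$, 
the entities provided at the maximal nodes of $\mathcal{D}'$ coincide with those of $\mathcal{D}$. 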

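– Suppose $\varphi \in \mathrm{Lem}_{\mathcal{D}}$ and $\psi \in \mathrm{Supp}(\varphi)$ with $\mathrm{loc}(\varphi) = M$ and $\mathrm{loc}(\psi) = M'$. If 
$N \notin \{M, M'\}$, then $\mathrm{loc}'(\varphi) = M$, $\mathrm{loc}'(\psi) = M'$, and $M' \xrightarrow{\sigma} M$ in $\mathcal{D}'$ trivially. 

If $M = N$ and $M' \neq N$, then $\mathrm{loc}'(\varphi) \in \{N_1, N_2\}$, and again $M' \xrightarrow{\sigma} \mathrm{loc}'(\varphi)$ in 
$\mathcal{D}'$. The case $M \neq N$ and $M' = N$ is proven analogously. We are left with 
the case $M = M' = N$. 

Since $N_1$ is independent of $N_2$, it holds that 
$\forall \varphi' \in \mathrm{ax}^{N_1} \cup \mathrm{lem}^{N_1}.\ \mathrm{Supp}(\varphi') \cap (\mathrm{ax}^{N_2} \cup \mathrm{lem}^{N_2}) = \emptyset$. 

Thus $\varphi \in \mathrm{ax}^{N_1} \cup \mathrm{lem}^{N_1}$ implies that $\psi \in \mathrm{ax}^{N_1} \cup \mathrm{lem}^{N_1}$ as well, and $N_1 \xrightarrow{\mathrm{id}} N_1$ 
holds trivially. □ 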
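The independence condition used in the vertical split, $\mathrm{Supp}(\varphi') \cap (\mathrm{ax}^{N_2} \cup \mathrm{lem}^{N_2}) = \emptyset$ for all $\varphi' \in \mathrm{ax}^{N_1} \cup \mathrm{lem}^{N_1}$, is easy to test mechanically. In the Python sketch below, a support mapping is assumed to be a dictionary from lemma names to the entities used in their proofs; the function `independent` and all entity names are hypothetical.

```python
def independent(lower, upper, supp):
    """True iff no axiom/lemma in `lower` is supported by one in `upper`.

    lower, upper: sets of entity names (ax and lem of N_1 and N_2);
    supp: maps each lemma to the set of entities used in its proof
    (axioms are simply absent from the mapping)."""
    return all(not (supp.get(phi, set()) & upper) for phi in lower)


supp = {"l1": {"a1"}, "l2": {"a1", "a2", "l1"}}
n1 = {"a1", "l1"}  # candidate lower node
n2 = {"a2", "l2"}  # candidate upper node

print(independent(n1, n2, supp))  # True: nothing in n1 uses n2
print(independent(n2, n1, supp))  # False: l2 relies on a1 and l1
```

The condition is directional: in this example, swapping the roles of the two candidate nodes makes it fail.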

Factorization 

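– We have to prove that $\mathrm{loc}'$ is a location mapping. First, we prove that $\mathrm{loc}'$ 
is surjective. For any node $K \in \mathcal{N}' \setminus \{N, N_1, \dots, N_p\}$, $\mathrm{loc}'^{-1}(K) = \mathrm{loc}^{-1}(K)$ 
holds. Since $\mathrm{sig}^N \cup \mathrm{ax}^N \neq \emptyset$ but $(\mathrm{sig}^N \cup \mathrm{ax}^N) \cap \mathrm{Dom}_{\mathcal{D}} = \emptyset$, it holds that 
$\mathrm{sig}^N \cup \mathrm{ax}^N \subseteq \mathrm{loc}'^{-1}(N)$. Furthermore, $\mathrm{sig}^{M_j} \cup \mathrm{ax}^{M_j} \subseteq \mathrm{loc}'^{-1}(N_j)$, since 
$\mathrm{sig}^{M_j} \cup \mathrm{ax}^{M_j} \subseteq \theta_j(\mathrm{sig}^N \cup \mathrm{ax}^N)$ and $\theta_j(\mathrm{sig}^N \cup \mathrm{ax}^N) \cap (\mathrm{sig}^N \cup \mathrm{ax}^N) = \emptyset$. 
Second, we have to prove that $\forall K \in \mathcal{N}'.\ \forall e \in \mathrm{dom}^K.\ \mathrm{loc}'(e) = K$ holds. If 
$K \notin \{N, N_1, \dots, N_p\}$, then $\mathrm{loc}'(e) = \mathrm{loc}(e) = K$. If $K = N$, then $\mathrm{dom}^N \subseteq 
\mathrm{Dom}_{\mathcal{D}'}(N)$ and $\mathrm{dom}^N \not\subseteq \mathrm{Dom}_{\mathcal{D}}(K_i)$ for $i = 1, \dots, n$, because $\mathrm{dom}^N \cap 
\mathrm{Dom}_{\mathcal{D}} = \emptyset$. Thus $\forall e \in \mathrm{dom}^N.\ \mathrm{loc}'(e) = N$. Finally, if $K = N_j$, then 
$\mathrm{dom}^{N_j} = \mathrm{lem}^{M_j} \setminus \theta_j(\mathrm{lem}^N)$. In particular, $\mathrm{dom}^{N_j} \cap \mathrm{Dom}_{\mathcal{D}'}(N) = \emptyset$, implying 
that $\mathrm{loc}'(e) = N_j$ for all $e \in \mathrm{dom}^{N_j}$. 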

Third, we prove that every $e \in \mathrm{Dom}_{\mathcal{D}'}$ is provided by a unique node. The 
only interesting case is that $e$ is provided by $N$ or by some $N_j$. In the case of $N$, both 
$\mathrm{dom}^N$ and the entities provided by some link from a $K_i$ are by definition not 
in $\mathrm{Dom}_{\mathcal{D}}$; thus they are not provided by any node already in $\mathcal{D}$, and by definition 
they are not provided by any $N_j$ either. It remains the case that an entity $e$ is provided by 
two nodes $N_i$ and $N_j$. Since every $e \in \mathrm{Dom}_{\mathcal{D}}$ was provided by a unique node, 
$e$ would have to be a mapped lemma of $N$, but that violates the 
precondition that each $\theta_i$ maps $e$ to a different entity. 
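– Next we prove that $\mathcal{D}$ and $\mathcal{D}'$ coincide in the entities they provide at their 
maximal nodes. Since $N$ is not a maximal node, it is sufficient to prove that 
$N_j$ and $M_j$ coincide in their provided entities: 

$$
\begin{aligned}
\mathrm{Dom}_{\mathcal{D}'}(N_j) &= \big(\mathrm{lem}^{M_j} \setminus \theta_j(\mathrm{lem}^N)\big) \cup \bigcup\{\sigma(\mathrm{Dom}_{\mathcal{D}'}(K)) \mid K \xrightarrow{\sigma} N_j\}\\
&= \big(\mathrm{lem}^{M_j} \setminus \theta_j(\mathrm{lem}^N)\big) \cup \bigcup\{\sigma(\mathrm{Dom}_{\mathcal{D}}(K)) \mid K \xrightarrow{\sigma} N_j,\ K \neq N\}\\
&\quad \cup\ \theta_j(\mathrm{sig}^N) \cup \theta_j(\mathrm{ax}^N) \cup \theta_j(\mathrm{lem}^N) \cup \bigcup\{\sigma_{i,j}(\mathrm{Dom}_{\mathcal{D}}(K_{i,j})) \mid i = 1, \dots, n\}\\
&= \mathrm{lem}^{M_j} \cup \mathrm{sig}^{M_j} \cup \mathrm{ax}^{M_j}\\
&\quad \cup \bigcup\{\sigma(\mathrm{Dom}_{\mathcal{D}}(K)) \mid K \xrightarrow{\sigma} M_j,\ K \neq K_{i,j} \lor \sigma \neq \sigma_{i,j}\}\\
&\quad \cup \bigcup\{\sigma_{i,j}(\mathrm{Dom}_{\mathcal{D}}(K_{i,j})) \mid i = 1, \dots, n\} \cup \theta_j(\mathrm{lem}^N)\\
&= \mathrm{Dom}_{\mathcal{D}}(M_j) \cup \theta_j(\mathrm{lem}^N).
\end{aligned}
$$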

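– Suppose $\varphi \in \mathrm{Lem}_{\mathcal{D}'}$ and $\psi \in \mathrm{Supp}'(\varphi)$. If $\mathrm{loc}'(\varphi), \mathrm{loc}'(\psi) \notin \{N, N_1, \dots, N_p\}$, 
then $\mathrm{loc}'(\varphi) = \mathrm{loc}(\varphi)$ and $\mathrm{loc}'(\psi) = \mathrm{loc}(\psi)$, and therefore $\exists \sigma.\ \mathrm{loc}(\psi) \xrightarrow{\sigma} \mathrm{loc}(\varphi)$ 
with $\sigma(\psi) = \psi$ in $\mathcal{D}$. Since $\mathcal{D}'$ inherits all links away from $M_1, \dots, M_p$, and 
paths traversing some $K_i$ and $M_j$ can be mapped to paths traversing $K_i$, 
$N$, and $N_j$, there is also some $\sigma'$ with $\mathrm{loc}'(\psi) \xrightarrow{\sigma'} \mathrm{loc}'(\varphi)$ and $\sigma'(\psi) = \psi$ in $\mathcal{D}'$. Next, let 
$\mathrm{loc}'(\varphi) = N_j$: by definition, $\varphi$ is a lemma of $M_j$ and $\mathrm{Supp}(\varphi) \subseteq \mathrm{Dom}_{\mathcal{D}}(M_j)$. 
Since $\mathrm{Dom}_{\mathcal{D}}(M_j) \subseteq \mathrm{Dom}_{\mathcal{D}'}(N_j)$, we know that $\mathrm{Supp}'(\varphi) = \mathrm{Supp}(\varphi) \subseteq 
\mathrm{Dom}_{\mathcal{D}'}(N_j)$ and thus $\forall \psi \in \mathrm{Supp}'(\varphi).\ \exists \sigma.\ \mathrm{loc}'(\psi) \xrightarrow{\sigma} N_j \land \sigma(\psi) = \psi$. 
Finally, let $\mathrm{loc}'(\varphi) = N$. Then $\mathrm{Supp}_N \subseteq \mathrm{Supp}$ is in particular a support mapping for $\varphi$. 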
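The bookkeeping behind factorization, namely $\mathrm{dom}^{N_j} = \mathrm{lem}^{M_j} \setminus \theta_j(\mathrm{lem}^N)$ together with the precondition that the $\theta_i$ map lemmas of $N$ to different entities, can be sketched in Python as follows. Renamings $\theta_j$ are assumed to be plain dictionaries on entity names; the function names and the example lemmas are hypothetical.

```python
def local_lemmas_after_factorization(lem_mj, theta_j, lem_n):
    """dom^{N_j} = lem^{M_j} minus theta_j(lem^N): the lemmas of M_j that are
    not obtained by renaming a lemma of the factored node N."""
    return set(lem_mj) - {theta_j[l] for l in lem_n}


def images_pairwise_disjoint(thetas, lem_n):
    """Precondition of the uniqueness argument: different theta_i map the
    lemmas of N to different entities."""
    seen = set()
    for theta in thetas:
        image = {theta[l] for l in lem_n}
        if image & seen:
            return False
        seen |= image
    return True


# Hypothetical example: N contains one lemma "comm", reused in two places.
lem_n = {"comm"}
theta_1 = {"comm": "add_comm"}
theta_2 = {"comm": "mul_comm"}

print(local_lemmas_after_factorization({"add_comm", "add_lemma"}, theta_1, lem_n))
# {'add_lemma'}
print(images_pairwise_disjoint([theta_1, theta_2], lem_n))  # True
```

If two renamings sent `comm` to the same entity, `images_pairwise_disjoint` would return `False`, which is exactly the situation excluded by the precondition.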

Transitive enrichment 

Obviously, the inclusion of the global link affects neither the visibility (i.e. 
$\mathrm{Dom}$) of any node in $\mathcal{N}$ nor the local entities provided by the individual nodes 
(i.e. $\mathrm{dom}$). Hence, all properties of a structuring carry over trivially to the 
enriched structuring. 
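In a simplified setting where all morphisms are identities, a transitive enrichment amounts to adding a global shortcut link alongside an existing path. The Python sketch below is a hypothetical model, not the paper's definition; it checks that such a shortcut leaves $\mathrm{Dom}$ unchanged at every node, which is the observation the argument above relies on.

```python
def dom_vis(node, dom, links):
    """Dom(node) under identity morphisms: local entities of node plus Dom of
    every node with a global link into it (acyclic graph assumed)."""
    result = set(dom[node])
    for src, tgt in links:
        if tgt == node:
            result |= dom_vis(src, dom, links)
    return result


dom = {"A": {"a"}, "B": {"b"}, "C": {"c"}}
links = [("A", "B"), ("B", "C")]
enriched = links + [("A", "C")]  # shortcut for the existing path A -> B -> C

unchanged = all(dom_vis(n, dom, links) == dom_vis(n, dom, enriched) for n in dom)
print(unchanged)  # True
```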

Removable link 

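– We have to prove that $\mathrm{loc}$ is also a location mapping for $\mathcal{D}'$. It holds that 
$\forall N \in \mathcal{N}.\ \mathrm{loc}_{\mathcal{D}}^{-1}(N) = \mathrm{loc}_{\mathcal{D}'}^{-1}(N)$, since $\mathrm{dom}^N$ remains unchanged and 
all $e \in \mathrm{loc}_{\mathcal{D}}^{-1}(N)$ that are exclusively provided by some link in $\mathcal{D}$ are still 
provided exclusively in $\mathcal{D}'$. Thus, $\mathrm{loc}$ is also surjective in $\mathcal{D}'$, 
$\forall N \in \mathcal{N}.\ \forall e \in \mathrm{dom}^N.\ \mathrm{loc}_{\mathcal{D}'}(e) = \mathrm{loc}_{\mathcal{D}}(e) = N$, and for every $e \in \mathrm{Dom}_{\mathcal{D}}$, $\mathrm{loc}_{\mathcal{D}'}(e)$ is the 
only node providing $e$. 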

– $\mathcal{D}$ and $\mathcal{D}'$ coincide in the entities they provide at their maximal nodes, which 
is an immediate consequence of condition (2) of Def. 9. 

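– Also, $\forall \varphi \in \mathrm{Lem}_{\mathcal{D}'}.\ \forall \psi \in \mathrm{Supp}(\varphi).\ \exists \sigma.\ \mathrm{loc}(\psi) \xrightarrow{\sigma} \mathrm{loc}(\varphi) \land \sigma(\psi) = \psi$ is 
implied by condition (3) of Def. 9. 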

□ 
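Def. 9 is not reproduced in this section, so the following Python sketch only checks the lemma-support condition in a simplified form, assuming identity morphisms: for every lemma $\varphi$ and every $\psi \in \mathrm{Supp}(\varphi)$, $\mathrm{loc}(\varphi)$ must be reachable from $\mathrm{loc}(\psi)$. It can be used to test whether removing a link keeps this condition; the helper names and the example graph are hypothetical.

```python
from collections import deque


def reaches(src, dst, links):
    """True iff dst is reachable from src (the empty path counts)."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for s, t in links:
            if s == node and t not in seen:
                seen.add(t)
                queue.append(t)
    return False


def support_condition(supp, loc, links):
    """Lemma-support condition with identity morphisms: for every lemma phi
    and every psi in Supp(phi), loc(phi) is reachable from loc(psi)."""
    return all(reaches(loc[psi], loc[phi], links)
               for phi, used in supp.items() for psi in used)


loc = {"a": "A", "b": "B", "l": "C"}
supp = {"l": {"a", "b"}}
links = [("A", "B"), ("B", "C"), ("A", "C")]

print(support_condition(supp, loc, links))                    # True
print(support_condition(supp, loc, [("A", "B"), ("B", "C")]))  # True
print(support_condition(supp, loc, [("A", "C")]))              # False
```

In the example, the direct link from A to C is redundant with respect to this condition, whereas keeping only that direct link is not, because B no longer reaches C.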
















